Microbial Genome Analyses: Global Comparisons of
Transport Capabilities Based on Phylogenies, 
Bioenergetics and Substrate Specificities 

Ian T. Paulsen, Marek K. Sliwinski and Milton H. Saier, Jr*

Department of Biology
University of California at San Diego, La Jolla
CA 92093-0116, USA

*Corresponding author

We have conducted genome sequence analyses of seven prokaryotic
microorganisms for which completely sequenced genomes are available
(Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Bacillus
subtilis, Mycoplasma genitalium, Synechocystis PCC6803 and Methanococcus
jannaschii). We report the distribution of encoded known and putative
polytopic cytoplasmic membrane transport proteins within these genomes.
Transport systems for each organism were classified according to (1)
putative membrane topology, (2) protein family, (3) bioenergetics, and (4)
substrate specificities. The overall transport capabilities of each organism
were thereby estimated. Probable function was assigned to greater than
90% of the putative transport proteins identified. The results show the
following: (1) Numbers of transport systems in eubacteria are
approximately proportional to genome size and correspond to 9.7 to 10.8% of
the total encoded genes except for H. pylori (5.4%), Synechocystis (4.7%)
and M. jannaschii (3.5%) which exhibit substantially lower proportions.
(2) The distribution of topological types is similar in all seven organisms.
(3) Transport systems belonging to 67 families were identified within the
genomes of these organisms, and about half of these families are also
found in eukaryotes. (4) 12% of these families are found exclusively in
Gram-negative bacteria, but none is found exclusively in Gram-positive
bacteria, cyanobacteria or archaea. (5) Two superfamilies, the ATP-binding
cassette (ABC) and major facilitator (MF) superfamilies account for
nearly 50% of all transporters in each organism, but the relative
representation of these two transporter types varies over a tenfold range,
depending on the organism. (6) Secondary, pmf-dependent carriers are 1.5 to
threefold more prevalent than primary ATP-dependent carriers in E. coli,
H. influenzae, H. pylori and B. subtilis while primary carriers are about
twofold more prevalent in M. genitalium and Synechocystis. M. jannaschii
exhibits a slight preference for secondary carriers. (7) Bioenergetics of
transport generally correlate with the primary forms of energy generated
via available metabolic pathways but ecological niche and substrate
availability may also be determining factors. (8) All organisms display a
similar range of transport specificities with quantitative differences
presumably reflective of disparate ecological niches. (9) M. jannaschii and
Synechocystis have a two to threefold increased proportion of transporters
for inorganic ions with a concomitant decrease in transporters for organic
compounds. (10) 6 to 18% of all transporters in these bacteria probably
function as drug export systems showing that these systems are prevalent
in non-pathogenic as well as pathogenic organisms. (11) All seven
prokaryotes examined encode proteins homologous to known channel
proteins, but none of the channel types identified occurs in all of these
organisms. (12) The phosphoenolpyruvate:sugar phosphotransferase system
is prevalent in the large genome organisms, E. coli and B. subtilis,
and is present in the small genome organisms, H. influenzae and
M. genitalium, but is totally lacking in H. pylori, Synechocystis and
M. jannaschii. Details of the information summarized in this article are
available on our web sites, and this information will be periodically
updated and corrected as new sequence and biochemical data become available.

© 1998 Academic Press Limited 

Keywords: Genomes; transport; cytoplasmic membrane; energetics; 
phylogeny 



Introduction 

The last three years have seen impressive pro- 
gress in the sequencing of entire genomes of both 
prokaryotic and eukaryotic free-living organisms. 
As of October, 1997, the complete genomic 
sequences for Haemophilus influenzae (Fleischmann
et al., 1995), Mycoplasma genitalium (Fraser et al.,
1995), Methanococcus jannaschii (Bult et al., 1996),
Synechocystis PCC6803 (Kaneko et al, 1996), Myco- 
plasma pneumoniae (Himmelreich et al, 1996), Sac- 
charomyces cerevisiae (Goffeau et al, 1997), 
Escherichia coli (Blattner et al, 1997), Helicobacter 
pylori (Tomb et al, 1997) and Bacillus subtilis (Kunst 
et al., 1997) were publicly available. The wealth of
data from these and forthcoming genome sequen-
cing projects is likely to promote a revolution in
the biological sciences. However, the impact result- 
ing from the availability of these data is not likely 
to be felt until systematic analyses have been con- 
ducted. 

Tremendous effort has been devoted to the 
characterization and classification of enzymes com- 
prising key metabolic pathways in microorganisms 
(Karp et al., 1996; Karp & Paley, 1996; Selkov et al.,
1996). However, the utilization of any exogenous
substrate depends on the activities of cytoplasmic 
membrane transport systems that allow entry of 
the potential metabolite into the cell cytoplasm. 
Further, excretion of metabolic end products 
depends on such transport systems, and the path- 
ways for entry of a particular substrate may differ 
from those for exit (Kramer, 1994). The characteriz-
ation and classification of transport proteins are
unfortunately far less advanced than those of meta- 
bolic enzymes. 

Transport proteins are known to catalyze trans- 
membrane solute translocation by several distinct 
mechanisms (Mitchell, 1967a,b; Saier, 1998). Thus, 
solute-non-specific and solute-specific channels as 
well as highly stereospecific carriers have been 
extensively studied (Fischer et al, 1995; Konings 
et al., 1996; Lee, 1996). Moreover, a variety of 
energy coupling mechanisms are known to drive 
the active uptake and/or extrusion of specific com-
pounds (Fath & Kolter, 1993; Kramer, 1994;
Konings et al, 1996; Paulsen et al, 1996a,b). The 
energy sources most commonly used include 
chemical energy in the form of ATP or phosphoe-
nolpyruvate (PEP) and chemiosmotic energy in the
form of a sodium or proton electrochemical gradi- 
ent, also known as the sodium or proton motive 
force (smf or pmf), respectively (Maloney, 1990, 
1992). However, other forms of energy are some- 
times utilized to drive transport (Dimroth, 1997; 
Saier, 1998). 

Transport proteins (porters) usually exhibit 
specificity for a restricted range of substrates 
(Clark & Amara, 1993; Poolman et al, 1996). Dis- 
tinct families of porters have been described on the 
basis of sequence similarities (Saier & Reizer, 1991). 
Within each family, phylogenetic analyses have 
revealed that substrate specificity frequently corre- 
lates with phylogeny (Saier, 1994, 1996, 1998). This 
observation implies that substrate specificity is 
often a well conserved evolutionary trait. Phyloge-
netic analyses therefore provide a reliable basis for
functional assignment. 

Cytoplasmic membrane transport proteins typi- 
cally consist of multiple membrane spanning α-
helical segments (TMSs) connected by loop regions
of various sizes (Kaback, 1986). Based in part on 
hydropathy analyses, and confirmed experimen- 
tally for several such systems, a surprising number 
of these proteins are believed to exhibit either six 
or 12 of these transmembrane segments (Yan &
Maloney, 1990, 1992; Saier, 1994; Dean &
Allikmets, 1995; Kuan et al., 1995; Pourcher et al.,
1996). While sequence comparisons indicate that a 
putative six TMS topological unit has evolved 
independently many times, the architectural basis
for this topological uniformity is not at present
understood (Henderson & Maiden, 1990; Griffith
et al, 1992; Maloney, 1990, 1992; Saier, 1998). 
Models suggesting specific three-dimensional 
topologies of these proteins have been proposed, 
but three-dimensional structural data are not yet 
available (Kaback, et al, 1997; Yan & Maloney, 
1993, 1995; Goswitz & Brooker, 1995; Le & Saier, 
1996). 

To date, few systematic and comprehensive ana- 
lyses of the transport capabilities of microorgan- 
isms have been reported (Clayton et al, 1997). In 
order to correct this deficiency, we have initiated a 
study to identify and characterize all existing cyto- 
plasmic membrane porters in those prokaryotic 
microorganisms for which extensive genome 
sequence data are available. In this capacity we 
have systematically analyzed the complete geno-
mic sequences of the eight prokaryotes cited 
above. Comparable analyses of the complete gen- 
ome of the eukaryote Saccharomyces cerevisiae are 
currently underway and will be the subject of a 
separate report (I.T.P., M.K.S. & M.H.S., Jr, unpub-
lished). These analyses have allowed us to identify
members of more than 67 families of transport sys- 
tems encoded within these genomes (see Table 2 
and our web sites for the detailed analyses). 

Here, we present the categorization of these pro- 
teins according to their energy coupling mechan- 
isms, their families, their substrate specificities, and 
their transmembrane topologies. The reported ana-
lyses allow an overall evaluation of the transport 
capabilities of these organisms. They also allow 
estimation of the number of transport protein 
families yet to be identified and characterized in 
bacteria. Some of the unexpected findings are inter- 
preted and discussed. 

Results 

Overall approach for the identification of 
membrane transport systems 

To identify all of the known and putative cyto- 
plasmic membrane transporters encoded within 
the genome of each organism, we utilized an 
approach based on the fact that bacterial cyto- 
plasmic membrane transport systems characteristi- 
cally include at least one membrane protein which 
contains hydrophobic regions predicted to form 
multiple, transmembrane, hydrophobic, α-helical
segments (TMSs; Konings et al, 1996; Saier, 1994, 
1996). We performed systematic hydropathy ana- 
lyses on all of the predicted protein products 
encoded within the genome of each organism ana- 
lyzed and predicted the likely number of TMSs in 
each protein (see Table 1 below). Proteins with one 
or more probable hydrophobic TMS(s) were then 
screened against the protein databases (see
Materials and Methods) in order to identify func-
tionally characterized homologs that would pro- 
vide an indication of the function of the test 
protein. Transport system families identified were 
subsequently classified on the basis of sequence 
similarities of protein members, bioenergetics, and 
substrate specificities. 
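
The first step of this approach, predicting the number of TMSs from a hydropathy profile, can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window and a cutoff of 1.6; these choices, and the function names `hydropathy_profile` and `count_putative_tms`, are illustrative assumptions and are not taken from this article, whose actual procedure is described in Materials and Methods.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the article's own
# method may differ). Non-standard residue codes will raise a KeyError.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        # Slide the window one residue to the right.
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def count_putative_tms(seq, window=19, threshold=1.6):
    """Count separate runs of windows whose mean hydropathy exceeds the threshold."""
    count = 0
    in_segment = False
    for score in hydropathy_profile(seq, window):
        if score > threshold:
            if not in_segment:
                count += 1
                in_segment = True
        else:
            in_segment = False
    return count
```

A protein for which `count_putative_tms` returns one or more would, in the workflow described above, be passed on to the database search step. Real predictors also merge or split segments by length, which this sketch does not attempt.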

Hydropathy analyses 

Systematic hydropathy analyses of all of the 
recognized proteins encoded within the complete 
genomes of the organisms studied enabled predic-
tion of membrane topologies for these proteins. An
overall comparison of the relative abundances of
membrane proteins of differing predicted topolo- 
gies for each organism is provided in Table 1. As 
can be seen, there is a similar distribution of pro- 
tein topological types in all of the organisms ana- 
lyzed. 
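
As a worked example of how the percentages in Table 1 follow from the raw counts, the sketch below recomputes them for the Escherichia coli row. The counts are those listed in Table 1; the dictionary layout and function names are illustrative assumptions.

```python
# Protein counts for Escherichia coli from Table 1, keyed by the
# number of predicted TMSs.
E_COLI_COUNTS = {
    "0": 2861,
    "1": 655,
    "2-3": 220,
    "4-6": 211,
    "7-9": 153,
    ">=10": 182,
}


def topology_percentages(counts):
    """Percentage of all encoded proteins in each TMS class, to one decimal."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


def integral_membrane_fraction(counts):
    """Fraction of proteins predicted to have at least one TMS."""
    total = sum(counts.values())
    return (total - counts["0"]) / total
```

For E. coli the recomputed values agree with the tabulated percentages, and about one third of the encoded proteins fall into the classes with at least one predicted TMS.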

Overall, about 70% (65 to 77% for the different 
organisms examined; Table 1) of the encoded pro- 
teins are predicted to be soluble or peripheral 
membrane proteins while only about 30% are inte- 
gral membrane proteins. The organisms with the 
largest genomes exhibit the largest proportion of 
putative integral membrane proteins while the 
archaeon, M. jannaschii, exhibits the smallest pro- 
portion. 11 to 19% of the proteins, or about half of 
the integral membrane proteins, possess only one 
putative TMS. Most of these proteins probably do 
not serve a primary transport function although
some of them clearly serve in auxiliary transport
capacities (Dinh et al., 1994; Paulsen et al., 1997; see
above). Many of the one TMS proteins are likely to
possess cleavable N-terminal signal sequences that
target the protein to the periplasm or outer mem-
brane (Saier et al., 1989). About 4 to 7% of the pro-
teins exhibit two to three putative TMSs. Some of
these proteins undoubtedly serve primary trans-
port functions in the cytoplasmic membrane as is
true for example of the MscL mechanosensitive ion



Table 1. Comparison of relative abundances of proteins of differing predicted membrane topologies for eight bacteria

                            Number of predicted TMS
Organism (classification)   0              1              2-3            4-6            7-9            ≥10
Escherichia coli            2861 (66.8%)   655 (15.3%)    220 (5.1%)     211 (4.9%)     153 (3.6%)     182 (4.3%)
  (Gram-negative bacterium)
Haemophilus influenzae      1204 (72.0%)   223 (13.3%)    67 (4.0%)      85 (5.1%)      53 (3.2%)      41 (2.4%)
  (Gram-negative bacterium)
Helicobacter pylori         1121 (71.1%)   239 (15.2%)    86 (5.5%)      72 (4.6%)      27 (1.7%)      32 (2.0%)
  (Gram-negative bacterium)
Bacillus subtilis           2686 (66.8%)   546 (13.6%)    266 (6.6%)     232 (5.8%)     148 (3.7%)     143 (3.6%)
  (Gram-positive bacterium)
Mycoplasma genitalium       318 (68.7%)    72 (15.6%)     24 (5.1%)      24 (5.1%)      17 (3.7%)      8 (1.7%)
  (Gram-positive bacterium)
Mycoplasma pneumoniae       474 (70.0%)    104 (15.4%)    34 (5.0%)      37 (5.5%)      18 (2.7%)      10 (1.5%)
  (Gram-positive bacterium)
Synechocystis               2058 (65.0%)   [illegible]    214 (6.8%)     160 (5.1%)     84 (2.7%)      58 (1.8%)
  (Cyanobacterium)
Methanococcus jannaschii    1341 (77.3%)   194 (11.2%)    76 (4.4%)      79 (4.6%)      26 (1.5%)      19 (1.1%)
  (Archaeon)

channel of E. coli (Hase et al., 1995; Sukharev et al.,
1994, 1996) and protein translocating holins
(Young & Blasi, 1995). The remaining polytopic
membrane proteins, possessing four or more puta- 
tive TMSs, include recognizable cytoplasmic mem- 
brane transport proteins (about half) as well as 
integral membrane receptors, electron carriers, and 
enzymes. 

Our analyses (see below) indicate that most 
established solute transport proteins possess six or 
more TMSs. As these proteins possess the lowest 
percentage of unique proteins or proteins of 
unknown function (unpublished observations), we
conclude that the majority of transport proteins
have functionally identified homologs. 

M. jannaschii was found to possess a larger per- 
centage of orphan proteins for almost all topologi- 
cal types compared with the other organisms 
examined (data not shown). This fact presumably 
reflects the meager amount of sequence and func- 
tional data available for the archaea as compared 
with bacteria and suggests that archaeal-specific 
families will be identified as additional sequence 
and functional data become available. 

Identification and classification of families of 
cytoplasmic membrane transport proteins 

Sequence comparison searches of the chromoso- 
mally encoded putative integral membrane pro- 
teins from each of the organisms examined 
enabled us to identify all homologs of functionally 
characterized and sequenced transport proteins. 
These proteins were classified on the basis of 
sequence similarities into 67 distinct families. The 
complete catalog of transporters according to 
organism and family (including function or func- 
tional prediction) as well as the catalog of all ident-
ified transporter families are publicly available
on our World Wide Web sites (see Materials and 
Methods). It is our intention to maintain and 
update these sites in the future as additional geno- 
mic sequence data become available and as new 
functional information is published. Individuals 
wishing to provide corrections or additional infor- 
mation, particularly regarding novel functionally 
identified transporters, are encouraged to contact 
us via the e-mail addresses available through our 
WWW sites. 

Much of the information currently available 
through our web sites is summarized in Tables 2 
and 3. Table 2 presents the various transporter 
families, grouped according to transport mode and 
energy coupling mechanism. For each family, a 
family designation, its abbreviation, its transport 
commission (TC) catalog number, and a represen- 
tative, well characterized member are provided. 
The Table also includes a compilation of typical 
substrates transported by the protein members of 
the family and an indication of the kingdoms (bac- 
terial (B), eukaryotic (E) and archaeal (A)) in which 
members of the family have been found. A refer- 
ence is cited which provides a phylogenetic
description of the family (when available) or pro-
vides a description of either the family or a well 
characterized member of the family. 

Bacterial channel proteins 

Channel proteins (class I families) include mem- 
bers of the large MIP family of aquaporins and gly- 
cerol facilitators (Park & Saier, 1996) as well as 
members of several ion channel protein families. 
Members of both voltage-sensitive and mechano-
sensitive ion channel families are present in the
bacteria under study, and both cation-selective and
anion-selective channels are represented (Jentsch
et al., 1995; Konings et al., 1996). Except for the
small prokaryotic-specific mechanosensitive ion
channels (Sukharev et al., 1994, 1996; Hase et al.,
1995), these families are represented in eukaryotes
as well as bacteria. While all of the bacteria exam-
ined possess ion channel proteins, members of no
channel protein family are found in all of these 
organisms. 

Secondary carriers 

Facilitators and secondary active transporters 
(class II families) represent the largest and most 
diverse category of transporters. Of the 67 families 
included in Tables 2 and 3, 45 fall into this cat- 
egory. The largest of these families are the ubiqui- 
tous major facilitator superfamily (MFS; 
Henderson & Maiden, 1990; Griffith et al, 1992; 
Marger & Saier, 1993; Essenberg et al, 1997; 
Goswitz & Brooker, 1993, 1995; Pao et al, 1998) 
and the ubiquitous amino acid-polyamine-choline
(APC) family (Reizer et al., 1993). While hundreds
of MFS and dozens of APC transporters have been 
sequenced, relatively few members of any one of 
the other secondary transporter families rep- 
resented have been sequenced. These families typi- 
cally include from 1 to 20 currently sequenced 
members. 

The secondary transporters listed in Table 2 fall
into three general groups: (1) the proton motive 
force (pmf)-driven group of solute:proton sympor- 
ters; (2) the sodium motive force (smf)-driven 
group of solute:Na+ symporters, and (3) the so- 
called metal ion exchangers. Families with a pre- 
ponderance of pmf-dependent permease members 
may also include members that catalyze sodium
symport, uniport and/or solute:solute antiport.
Similarly, select members of a family of smf-depen- 
dent permeases may catalyze pmf-dependent 
transport. Thus, pmf and smf-dependent systems 
seem to have undergone facile interconversion 
throughout evolutionary history. Finally, the metal 
ion exchangers may also catalyze uniport or sym- 
port with protons in addition to metal ion:proton 
antiport. 

Examination of the occurrence of members of the 
45 secondary facilitator families represented in 
Table 2 reveals that of them, 21 (48%) are bacterial- 
specific. Of these, six (14%) are found exclusively
Table 3. Distribution of constituent members of transport protein families according to organism
[Table body not recovered. Surviving row-group headings: II. Secondary active transporters; III. Primary active transporters]

Table 4. Number of membrane transport systems compared with genome size

Organism                Genome size (Mb)   No. of transport systems/100 kb   % genes encoding transport proteins
E. coli                 4.60               6.2                               10.8
H. influenzae           1.83               4.8                               9.8
H. pylori               1.67               3.2                               5.4
B. subtilis             4.21               5.6                               9.7
M. genitalium           0.58               3.7                               10.2
Synechocystis PCC6803   3.57               2.6                               4.7
M. jannaschii           1.66               2.2                               3.5

in Gram-negative bacteria, but only one (2%) is 
found exclusively in Gram-positive bacteria. No 
family revealed by the genome analyses reported 
here is specific to the archaea or cyanobacteria. 
However, more comprehensive analyses of archae- 
al sequences have revealed two uniquely archaeal 
transporter families (unpublished results; see our 
second WWW site). 

All but seven of the 45 secondary transporter 
families found encoded within the genomes of the 
seven prokaryotes analyzed are found in E. coli,
and two of the seven families lacking from E. coli
are represented in H. influenzae (Table 3). Thus,
most of the secondary transporter families found
in prokaryotes are represented by systems encoded
within the E. coli genome and that of its close rela-
tive, H. influenzae. Four families (9%) are found in 
both bacteria and archaea but not in eukaryotes, 
while seven families (16%) are represented in both 
bacteria and eukaryotes but not archaea. Finally, 
12 of these secondary carrier families (27%) are 
found in all three of the kingdoms of life. Overall, 
we see that 24 of the families of secondary facilita- 
tors represented in Table 2 (55%) have not been
found in eukaryotes, while 29 of these families 
(66%) have not been found in archaea. 
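
The family counts and percentages quoted in this section can be cross-checked arithmetically. The sketch below is a minimal illustration, with hypothetical names (`QUOTED`, `consistent_denominators`), that searches for the total number of families against which all of the legible percentages round correctly. It suggests that the percentages were calculated against 44 families rather than 45, which also matches the sum of the four mutually exclusive kingdom categories (21 + 4 + 7 + 12); this is an inference from the numbers, not something stated in the text.

```python
# (number of families, percentage quoted in the text) for the
# secondary carrier families discussed above.
QUOTED = {
    "bacterial-specific": (21, 48),
    "exclusively Gram-negative": (6, 14),
    "exclusively Gram-positive": (1, 2),
    "bacteria and archaea, not eukaryotes": (4, 9),
    "bacteria and eukaryotes, not archaea": (7, 16),
    "all three kingdoms": (12, 27),
    "not found in archaea": (29, 66),
}


def consistent_denominators(quoted, candidates=range(40, 51)):
    """Return every candidate total for which all rounded percentages match."""
    return [
        total
        for total in candidates
        if all(round(100.0 * n / total) == pct for n, pct in quoted.values())
    ]


def percentage(n, total):
    """Percentage of families, rounded to the nearest whole number."""
    return round(100.0 * n / total)
```

With a total of 44, the 24 families not found in eukaryotes correspond to `percentage(24, 44)`, that is, 55%.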

ATP-driven primary active transporters 

In the seven prokaryotic organisms examined, 
only four families of solute transporters include 
members that are known to function as primary
active transporters using ATP hydrolysis as a
mode of energy coupling (class III families). These
families include: (1) the ABC superfamily, capable 
of transporting an extremely diverse group of sub- 
stances with inwardly directed or outwardly 
directed polarity; (2) the P-type ATPases which cat-
alyze proton, alkali and heavy metal ion transport
with either inwardly or outwardly directed
polarity (these enzymes are the only ones known 
to be autophosphorylated); (3) F, V, and A-types of 
ATPase multisubunit complexes that catalyze the 
reversible electrogenic transport of H+ or Na+ at 
the expense of ATP hydrolysis, and (4) ArsAB-type 
arsenite/antimonite exporters which function 
superficially as do members of the ABC superfam- 
ily. It should be noted that the ArsA/B-type trans- 
porters may also be able to function as secondary 
transporters in the absence of their ATP binding 
proteins (Dey & Rosen, 1995). All four of these 
families are ubiquitous, being found in bacteria, 
archaea and eukaryotes. Other ATP-dependent
transport systems that translocate macromolecules 
(e.g. proteins, nucleic acids and complex carbo- 
hydrates) were not included in this study. 

Group translocators 

Group translocators (class IV families), which 
phosphorylate their substrates during transport, 
fall into the functional superfamily of phospho- 
transferases (members of the bacterial-specific 
phosphotransferase system (PTS); Postma et al.,
1996). Sequence comparisons group these transpor- 
ters into six phylogenetic classifications that corre- 
late with substrate specificity. These six 
phylogenetic groups or families include: (1) the 
glucose and glucoside (Glc)-specific family, (2) 
the fructose and mannitol (Fru)-specific family,
(3) the lactose and cellobiose (Lac)-specific family,
(4) the glucitol (Gut)-specific family, (5) the galacti-
tol (Gat)-specific family, and (6) the mannose-
fructose-sorbose (Man)-specific family. These
multicomponent transporters consist of three or
four proteins or protein domains (IIA, B and C,


Table 5. Comparis( 
and sometimes D), which clearly evolved from a
variety of protein sources (Saier & Reizer, 1992, 
1994). At least some of these different families 
apparently evolved independently of each other. 
Thus, the IIA constituents of the Glc and Fru sys- 
tems evolved from different sources and have com- 
pletely different three-dimensional structures, but 
the IIC constituents are homologous. By contrast,
all constituents of the Man family evolved inde- 
pendently of those of the other five families. For 
these reasons, the PTS is said to consist of 
"mosaic" transport systems with a unified mech- 
anism and function, but without a unified structure 
or evolutionary source (Saier & Reizer, 1994). 

All of the PTS transporters utilize phosphoenol- 
pyruvate (PEP) as both the energy source for trans- 
port and the phosphoryl donor for sugar 
phosphorylation. The general energy coupling pro- 
teins (Enzyme I and the heat stable phosphocarrier 
protein, HPr) pass the phosphoryl group from PEP 
to the sugar-specific IIA and IIB domains or pro- 
teins in preparation for group translocation cata- 
lyzed by the IIC constituent of the Enzyme II 
complex (Saier & Reizer, 1992, 1994). To date, no 
PTS homolog has been identified in an archaeon or 
a eukaryote (J. Reizer and M.H.S., Jr, unpublished
observations). The results show that not only does the
archaeon, M. jannaschii, lack genes homologous to
PTS-encoding genes, but the blue-green bacterium,
Synechocystis, and the Gram-negative bacterium,
H. pylori, also lack such genes.

Transporters of unknown classification 

Several families are listed among class V families 
(families of unknown classification), because, 
although their transport functions are established, 
their modes of transport or energy coupling mech- 
anisms are not. Many of these families are likely to 
prove to fall into the class II group of secondary 
active transporters, but a few may prove to be 
members of class I (channel-type) or class III (ATP- 
dependent). Finally, additional polytopic proteins
may be transporters, but no evidence for this fact 
is currently available because of a lack of function- 
ally characterized homologs. Proteins in these 
families are not included in Tables 2 and 3 and can 
only be identified when functional data become 
available. 

Transporter family representation in 
seven prokaryotes 

Table 3 provides the numbers of transport sys- 
tem paralogs for each of the 67 families encoded 
within the genomes of the seven organisms exam- 
ined. Interestingly, with the exception of M. jan- 
naschii and Synechocystis PCC6803, the total 
number of transport systems per organism is 
approximately proportional to genome size despite 
the tremendous disparities in the sizes of the gen-
omes analyzed (eight-fold; see Table 4). Thus, there 
is less than two-fold variation among the five
eubacteria examined and less than three-fold vari-
ation when all seven organisms are considered. 
This fact is emphasized by the results summarized 
in Table 4 which presents the number of transpor- 
ters in each organism per 100 kb of genomic DNA. 
As can be seen, the five eubacteria under study 
possess between 3.2 and 6.2 transporters per 100 
kb. M. jannaschii and Synechocystis PCC6803 exhibit 
approximately twofold lower values (2.2 to 2.6). 
This last observation we believe reflects both the 
ecological niches and the metabolic capabilities of 
the latter bacteria which live in nutrient poor mar- 
ine and fresh water environments, respectively, 
and derive their energy primarily from metabolism 
of exogenous inorganic compounds (Rippka et al.,
1979; Bult et al, 1996; Kaneko et al, 1996). 
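
The fold-variation figures in this paragraph can be reproduced from the densities in Table 4. The sketch below is illustrative only: the dictionary layout and function names are assumptions, the B. subtilis genome size is taken as 4.21 Mb as stated in the text, and the system count it derives is an approximation from rounded table values rather than a number reported in the article.

```python
# (genome size in Mb, transport systems per 100 kb) from Table 4.
TABLE_4 = {
    "E. coli": (4.60, 6.2),
    "H. influenzae": (1.83, 4.8),
    "H. pylori": (1.67, 3.2),
    "B. subtilis": (4.21, 5.6),
    "M. genitalium": (0.58, 3.7),
    "Synechocystis PCC6803": (3.57, 2.6),
    "M. jannaschii": (1.66, 2.2),
}

# The five eubacteria referred to in the text.
EUBACTERIA = ["E. coli", "H. influenzae", "H. pylori", "B. subtilis", "M. genitalium"]


def fold_variation(names):
    """Ratio of the highest to the lowest transporter density among the named organisms."""
    densities = [TABLE_4[name][1] for name in names]
    return max(densities) / min(densities)


def approximate_system_count(name):
    """Rough number of transport systems implied by genome size and density."""
    size_mb, per_100kb = TABLE_4[name]
    return round(size_mb * 10 * per_100kb)  # 1 Mb = 10 units of 100 kb
```

Calling `fold_variation(EUBACTERIA)` gives a ratio just under two, and `fold_variation(list(TABLE_4))` gives a ratio just under three, in line with the statements above.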

Comparing the Gram-negative H. influenzae (1.83 
Mbp) or H. pylori (1.67 Mbp) with E. coli (4.60
Mbp), or the Gram-positive M. genitalium (0.58
Mbp) with B. subtilis (4.21 Mbp), it is clear that the
increased numbers of transport systems in organ- 
isms with large genomes are due both to increased 
numbers of paralogs within the representative 
large families (e.g. ABC and MFS) and to increased 
representation of smaller families. The increase in 
small family representation is comparable to the 
increase in large family representation both in the 
Gram-positive bacterial pair and in the three 
Gram-negative bacteria examined. 

Distribution of transporter types in the 
different microorganisms 

Comparison of the numbers of transporters 
within each of the four energy coupling categories 
included in Table 3 reveals dramatic differences 
among the various organisms examined (Table 5). 
Thus, E. coli and B. subtilis possess comparable dis-
tributions of transporter types, with emphasis on
secondary active transporters, but H. influenzae, H.
pylori and M. genitalium show a relative increase in
ATP-dependent systems, particularly ABC-type
transporters. This tendency is particularly pro-
nounced in M. genitalium which has the smallest
genome and lacks both electron transport and a
functional TCA cycle (Fraser et al., 1995). This last
observation accounts for the reliance of M. genita-
lium on primary active transport as this organism
derives its energy exclusively from substrate level
phosphorylation.

Examination of M. jannaschii, the only archaeon 
represented, reveals a pattern quite different from 
those observed for the bacteria. First, there is a 
greater reliance on ATP-dependent systems, par- 
ticularly ABC-type systems, compared to E. coli or
B. subtilis (Tables 3 and 5). Second, there are no
PTS group translocators. The PTS at present
appears to be eubacterial specific as no such
sequences have been revealed during extensive
sequencing of archaeal and eukaryotic genomes (J.
Reizer and M.H.S., Jr, unpublished observations).
Third, putative cation-selective and anion-selective 
channels are both present in M. jannaschii, a situ- 
ation not observed for the other sequenced small
genome organisms. Finally, M. jannaschii exhibits a
larger percent of putative transporters of unknown
function than any of the bacteria except Synechocys-
tis.

Synechocystis PCC6803 resembles M. genitalium 
in exhibiting a much greater proportion of ATP- 
dependent transporters relative to secondary trans-
porters. Like H. pylori and M. jannaschii, it lacks
PTS group translocators. However, it possesses the
full complement of channel proteins found in
E. coli. These channels include a MIP family protein
that may function in Cu2+ homeostasis (Kashiwagi
et al., 1995; Park & Saier, 1996), a putative mechan- 
osensitive ion channel protein, and putative cation 
and anion-selective ion channels (Table 3). 

H. influenzae shows some interesting tendencies 
relative to its close relative, E. coli. First, it exhibits 
an increased proportion of ABC-type permeases 
relative to secondary transporters as noted above. 
Second, it shows a dramatic decrease in the pro- 
portion of pmf-dependent secondary transporters 
but a large increase in the proportion of metal ion 
exchangers, and a particularly large increase in the 
proportion of sodium-dependent nutrient uptake 
systems. Finally, there is a marked decrease in the 
proportion of PTS-type systems. We expect that 
these observations reflect multiple ecological and 
metabolic considerations. For example, H. influen- 
zae may exist primarily in a high salt, carbohydrate 
poor medium where many nutrients are present in 
low concentrations (Macfadyen & Redfield, 1996). 
High salt environments permit the use of Na+-
symport which can be more efficient than H+ sym- 
port due to the greater inherent permeability of 
biological membranes to protons than to sodium 
ions. Carbohydrate deficiency would minimize the 
benefit of possessing an extensive PTS, and low 
nutrient concentrations would render high affinity 
ABC-type transport systems of benefit. Addition- 
ally, H. influenzae lacks a complete Krebs cycle, and 
like many other Gram-negative and Gram-positive 
bacteria, these organisms use glycolysis only for 
fructose catabolism as phosphofructokinase is lack- 
ing. This latter fact explains why only fructose is 
transported via the PTS. 

The distribution of transporter types in H. pylori 
closely resembles that for H. influenzae except that 
the PTS is lacking. This last fact indicates that 
carbohydrate catabolism is of little importance in 
this organism, a suggestion substantiated by the 
data summarized in Table 7. H. pylori appears to 
have only one carbohydrate-specific transporter
and may depend on amino acids and carboxylates
as sources of carbon. 

Relative distribution of ABC and MF
superfamilies in seven prokaryotes

Table 6 summarizes the distributions of ABC- 
type and MFS-type transporters encoded within 
the genomes of the seven prokaryotes analyzed. As 
is apparent from the data presented, there is a tremendous
difference in the relative proportions of
the members of these two superfamilies, depending
on organism. Thus, at one end of the scale,
B. subtilis and E. coli encode within their genomes 
more MFS porters than ABC permease systems, 
while M. genitalium and Synechocystis encode more 
than tenfold more ABC transport systems than 
MFS permeases (Table 6). The other bacteria ana- 
lyzed exhibit intermediate proportions of these two 
transporter types. The distributions of ABC and 
MFS transporters in the various organisms quali- 
tatively (but not quantitatively) correlate with their 
relative reliances on ATP and pmf-dependent 
transporters, respectively. Surprisingly, the fraction 
of all encoded transporters that are members of 
these two superfamilies is almost invariant (0.38 to
0.53 for the seven prokaryotes examined; Table 6). 
This unexpected observation has yet to be 
explained. 
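
The two quantities compared in this paragraph, the ABC:MFS ratio and the fraction of all transporters that fall into either superfamily, are simple to compute once per-genome counts are available. The Python sketch below only illustrates that arithmetic. The function name and the example counts are hypothetical placeholders and are not the values of Table 6.

```python
def superfamily_summary(n_abc, n_mfs, n_total):
    """Return (ABC:MFS ratio, combined fraction) for one genome.

    n_abc, n_mfs: numbers of ABC and MFS transport systems
    n_total: total number of transport systems identified
    """
    if n_total <= 0 or n_abc + n_mfs > n_total:
        raise ValueError("counts are inconsistent")
    ratio = n_abc / n_mfs if n_mfs else float("inf")
    combined = (n_abc + n_mfs) / n_total
    return ratio, combined


# Hypothetical counts for illustration only (not taken from Table 6).
examples = {
    "organism_A": (30, 40, 150),  # MFS-rich genome
    "organism_B": (22, 2, 50),    # ABC-rich genome
}
for name, counts in examples.items():
    ratio, combined = superfamily_summary(*counts)
    print(f"{name}: ABC:MFS = {ratio:.2f}, combined fraction = {combined:.2f}")
```

With real counts, a ratio above one marks an ABC-dominated genome, while the combined fraction is the quantity reported above as nearly invariant.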

Comparison of substrate specificities for 
cytoplasmic membrane transporters

Table 7 summarizes the distribution of transpor- 
ters in the seven organisms analyzed according to 
class of compound transported. The ranges of 
transporter substrate specificities among the five 
eubacteria examined are strikingly similar. All of 
these bacteria can take up a wide range of organic 
substances as well as inorganic cations and anions. 
Drugs appear to be actively effluxed from them all. 
Interestingly, the percent of transporters that prob- 
ably catalyze drug efflux is highest for the non- 
pathogenic soil bacterium, B. subtilis (17.9%), 
although the pathogen, H. pylori, comes in a close 
second (15.1%). The high percentage of drug transporters
encoded within the genome of B. subtilis
may render it effective in competing for growth
and survival in the presence of other antibiotic-producing
soil microorganisms.

M. jannaschii and Synechocystis are fundamen- 
tally different from the eubacteria with respect to 
the substrate specificities of their transporters. A far 
greater percentage of their transporters recognize 
inorganic ions (47% relative to 23% for the eubacteria).
This fact was also noted by Clayton et al.
(1997). Only a small number of transporters in M. 
jannaschii and Synechocystis appear to be involved 
in carbon source uptake (3 to 6% relative to an 
average of about 22% for the eubacteria). These 
differences presumably reflect the ecological niches 
and metabolic capabilities of M. jannaschii and 
Synechocystis relative to those of the other organ- 
isms analyzed. It should be noted that M. jannaschii 
and Synechocystis possess greater proportions of 
functionally unidentified transporter homologs, 
suggesting the existence of novel types of transport 
systems that are as yet uncharacterized. 

Transporters of E. coli and B. subtilis exhibit similar
ranges of specificity. H. influenzae is similar in
its transport capability to E. coli but exhibits an 
increased proportion of organic anion porters with 
a corresponding decrease in carbohydrate porters. 
Metal ion homeostasis may be of particular import- 
ance in this organism as an increased proportion of 
metal ion transporters (Table 7) correlates with the 
high proportion of solute:Na+ (versus H+) symporters
(see Table 3).

M. genitalium possesses only a small number of 
transporters, reflective of its small genome size, 
and it exhibits a higher percentage of unknown 
transporters than the other eubacteria examined
(Table 7). Conclusions based on comparisons with 
this organism should be treated with caution. 
M. genitalium has an abbreviated array of mem- 
brane transporters with most potentially non- 
essential permease systems eliminated. The 
absence of carboxylate transporters correlates with 
its lack of a Krebs cycle and an electron transfer 
chain. However, the absence of recognizable trans- 
porters for precursors of nucleic acids suggests that 
one or more of the unrecognized transport systems 
must transport these substances. 

Discussion 

Evolutionary considerations 

This paper represents the first comprehensive 
attempt to identify and classify all recognizable 
transporters encoded within the sequenced gen- 
omes of prokaryotic organisms. Analyses of seven 
genomes have enabled us to identify 67 distinct 
families of transport systems based on primary 
structure. These families include large, well charac- 
terized families such as the major facilitator (MF) 
and ATP-binding cassette (ABC) superfamilies, but
they also include many smaller, hitherto unrecog- 
nized families. These small families exhibit a large 
spectrum of properties and organismal distri- 
butions. For example, while some (about half) are 
restricted to bacteria and exhibit extensive 
sequence conservation, others are widely distribu- 
ted in prokaryotic and eukaryotic phyla (42%) and 
exhibit tremendous sequence divergence. On this
basis, we estimate that some of the latter families 
are probably ancient families that date back over
three billion years. Nevertheless, some of these
ancient families are very restricted with respect to 
their substrate specificities. Thus, proteins of the 
CDF family (TC no. 2.4) transport only divalent 
cations; those of the CaCA family (TC no. 2.19) are 
apparently specific for Ca2+; those of the Pit family
(TC no. 2.20) are specific for inorganic phosphate,
and those of the Amt family (TC no. 2.49) transport
only NH4+. It is noteworthy that most of the ubiquitous
families with narrow substrate specificities
transport inorganic ions. However, some families 
which include members specific for inorganic ions 
such as the MF, RND and ABC families (TC nos 
2.1, 2.6 and 3.1, respectively) transport many other 
substrates in addition to inorganic ions. Other 
families are specific for restricted types of organic 
substrates (e.g. the RhaT, KdgT and LctP families; 
TC nos 2.9, 2.10 and 2.14, respectively), but these 
small families have been found only in a limited 
range of organisms (see Table 2). Whether these 
families have recently appeared or have ancient 
roots is an interesting question that may become 
answerable when more sequence data become 
available. 

Architecture and evolutionary potential of 
transport proteins 

The two largest transporter families found in
nature, the ABC superfamily and the MFS, are 
particularly interesting with respect to their 
remarkable substrate ranges. Thus, proteins of the 
MFS are capable of transporting almost all types of
low molecular weight compounds (e.g., sugars, 
drugs, inorganic and organic anions and cations, 
etc.). However, the ABC superfamily can transport 
all of these substrates as well as macromolecules 
such as proteins, complex carbohydrates and 
lipids. To date, no member of the MFS has been
found to transport a molecule of >1000 Da size. 
We suspect that this difference in substrate range 
may be attributable to the dimensions and flexi- 
bility of the transmembrane permease channels. 
Thus, the channels of ABC permeases may be
more flexible and accommodating than those of
the MFS permeases. The constituent α-helices may
be able to "breathe," and the subunits may even
dissociate from one another. We expect that these
two permease types will prove to exhibit very
different channel architectures.

The broad specificities and variable polarities 
observed for members of the ABC and MF superfamilies
presumably account for the large numbers
of paralogs found within a single bacterium. E. coli,
for example, has 64 MFS and 63 ABC transport
systems, and these E. coli paralogs exhibit nearly
the entire gamut of substrate specificities 
represented within these two superfamilies in all 
living organisms (see our web sites for descriptions 
of the individual permease systems). 

Some smaller families, such as the APC (TC no.
2.3) and SSS (TC no. 2.21) families include members
that exhibit an intermediate range of substrate
specificities. For example, APC family members
transport amino acids, polyamines and choline by 
proton symport and/or by solute:solute antiport, 
while the SSS proteins transport amino acids, 
sugars, vitamins, nucleosides and inorganic anions, 
all by Na+ symport. The variability in the substrate 
specificities of APC family permeases is easy to 
rationalize as all of the substrates are structurally 
related. However, the same is not true of the SSS
permeases. In this latter case, the only common 
feature appears to be recognition and cotransport 
of Na+. The variability of the substrates recognized 
by the different SSS family members (e.g. proline, 
glucose, pantothenate, adenosine, iodide, etc.) is 
difficult to explain. Although this family is wide- 
spread and diverse in substrate specificity, it 
remains a relatively small family with few currently
sequenced members and much less sequence
divergence than, for example, the MFS. This limited
distribution correlates with the uniformity exhibited
by its members with respect to polarity of
transport and the nature of the cotransported cation.
We anticipate that SSS family proteins will
prove to exhibit a channel architecture that is
different from that of either MFS or ABC-type
transport systems. 

Phylogeny and substrate specificity 

As noted above, the SSS appears to be an excep- 
tional family. A primary theme revealed by our 
analyses of permease families is the striking corre- 
lation of phylogeny with substrate specificity. This 
observation clearly suggests that substrate speci- 
ficity has been a well conserved trait throughout 
evolutionary history, almost irrespective of the 
evolving family. At first glance this fact seems to 
be at odds with recent experimental work showing 
that single amino acid substitutions can dramati- 
cally alter the substrate specificity and even the 
polarity of transport. For example, a single amino 
acyl substitution in bacteriorhodopsin of Halobacterium
salinarium (Asp85 → Thr) changes the protein
from a proton pump to a chloride pump of
opposite polarity (Sasaki et al., 1995). Moreover,
single amino acid substitutions in the lactose
permease of E. coli can change its specificity for
β-galactosides so that it can transport arabinose,
α-glucosides or β-glucosides, depending on the
substitution (Brooker & Wilson, 1985; Collins et al.,
1989; King & Wilson, 1990; Olsen et al., 1993; see
Varela & Wilson, 1996 for a current review). One
final example is the QacA drug resistance pump of
Staphylococcus aureus which dramatically changes
its range of substrates when specific point
mutations are introduced (Paulsen et al., 1996a,b).

On the basis of these observations, it would 
seem that changing substrate specificity is a simple 
matter. However, in none of these examples do the 
mutations change the class of compound recog- 
nized. It would be most interesting to conduct 
comparable studies with members of families that 
exhibit invariant specificities (e.g. the Amt family
transporters, all of which transport NH4+, or the
CaCA family, all of which transport Ca2+) to see if
these permeases can also change their specificities 
upon introduction of point mutations. Parallel stu- 
dies with the SSS family, which exhibits unusually 
broad substrate specificity but narrow cation speci- 
ficity would be of further interest. 

Energy availability, environmental factors and 
transport mode 

We have found that some organisms exhibit a 
preponderance of pmf-driven transporters while 
others exhibit a preponderance of ATP-driven per- 
meases. In some cases, this difference seems to be 
explained by the type of energy source most 
readily generated by the organism. M. genitalium 
generates ATP as its primary energy source and 
has mostly ATP-dependent permeases, while 
B. subtilis generates a pmf as its primary source of 
energy and exhibits the opposite tendency
(Table 3). Other organisms are less readily rationalized.
Thus, E. coli and Synechocystis can generate
both ATP and a pmf as primary energy sources,
but while E. coli exhibits a 2.5-fold excess of pmf-driven
permeases over ATP-driven permeases,
Synechocystis exhibits a 2.5-fold excess of ATP-driven
systems. In these cases, the environments in
which these organisms flourish may be important. 
Thus, while E. coli is an intestinal bacterium where
nutrients are often present at high concentrations,
Synechocystis is a freshwater organism living in
nutrient-poor surroundings (Rippka et al., 1979).
The latter organism may require the presence of
high affinity transporters to survive and grow,
while low affinity uptake systems may suffice for
E. coli. ATP-dependent nutrient transporters in
general transport their solutes with much higher 
affinities than do pmf-driven systems. It should 
also be noted, however, that Synechocystis pos- 
sesses an intracellular thylakoid membrane system 
that is not continuous with the plasma membrane. 
The generation of a pmf across the thylakoid membrane
may not allow pmf-dependent uptake of
nutrients across the cytoplasmic membrane.

Ubiquity of transport protein families 

Only four transport protein families are represented
in all seven of the organisms we have
examined. These families are the MF (TC no. 2.1),
APC (TC no. 2.3), ABC (TC no. 3.1) and F-type 
ATPase (TC no. 3.2) families. F-type ATPases prob- 
ably have the same function in all living organ- 
isms: they interconvert chemical and chemiosmotic 
energy forms. However, examination of the protein 
members of the other three ubiquitous families 
shows that their specificities differ dramatically, 
and that no single solute-specific permease is pre- 
sent in all seven organisms. This fact stresses the 
transport variability observed for different micro- 
organisms. Characterization of an organism's 
complement of transport proteins should reveal 
many important aspects of its lifestyle: its preferred
exogenous sources of energy and nutrition, its 
metabolic pathways, and its end products of 
metabolism. The realization of this goal will 
require extensive physiological, molecular genetic 
and biochemical analyses of permease function. 

Transport capacity and cellular 
metabolic capability 

M. genitalium is an organism with almost no bio- 
synthetic potential. It must acquire all precursors 
for the biosynthesis of proteins, nucleic acids, lipids 
and complex carbohydrates from exogenous 
sources. One might expect that a greatly increased
proportion of the genes of such an organism
would encode transport proteins. In the case of
M. genitalium, this postulate proved to be untrue.
Nearly the same percent of the genomes of M. genitalium
and E. coli encode recognized permeases (10.2%
and 10.8%, respectively; Table 4) although M. geni- 
talium has only 7% as many genes as does E. coli. 
This observation clearly suggests that many of the 
transport systems of M. genitalium must differ in 
specificity from those of E. coli, being capable of 
transporting many related, essential compounds. 
We estimate that M. genitalium has a single proton- 
translocating F-type ATPase (TC no. 3.2), a single 
Ca2+-translocating P-type ATPase (TC no. 3.3), a 
single K+:H+ antiporter of the Trk family (TC no. 
2.38) and two inorganic anion transporters of the 
ABC family (TC no. 3.1), one specific for phos- 
phate, the other of unknown specificity. These five 
permeases presumably suffice to maintain the pmf 
and proper cytoplasmic ionic balance. 

M. genitalium devotes a major fraction of its 
transport capacity to the task of sugar uptake for 
the purpose of carbon and energy acquisition. 
Thus, it possesses: (1) both glucose and fructose 
phosphotransferase systems (TC no. 4.1 and TC 
no. 4.2, respectively; Reizer et al., 1996c); (2) two
sugar permeases of the ABC superfamily, one of
which probably transports oligosaccharides (TC
no. 3.1.1) while the other transports monosaccharides
(TC no. 3.1.2; Tam & Saier, 1993), and (3) a
non-specific channel protein of the MIP family (TC 
no. 1.1), probably capable of transporting small, 
neutral, straight chain molecules such as urea, gly- 
cerol and various polyols (Park & Saier, 1996). As 
M. genitalium lacks a TCA cycle and electron flow, 
it is possible that it lacks specific transporters for 
organic carboxylates. However, even these compounds
may be transported by non-specific
permeases. 

M. genitalium takes up amino acids via two APC 
transporters (TC no. 2.3), and it probably accumu- 
lates peptides and polyamines via two distinct 
ABC-type systems. These four permeases presum- 
ably provide the requisite precursors for protein 
synthesis. However, we did not identify permeases 
likely to accumulate vitamins or the precursors of
lipid and nucleic acid biosynthesis. At least three
transport systems of unknown specificity (two
ABC-types and one MFS-type) were identified, and
these, or broad specificity sugar, amino acid or 
peptide permeases, may provide these functions. 

Finally, M. genitalium possesses three transport 
systems that are likely to function in the active 
extrusion of drugs and other toxic substances 
(Table 7). These transport systems may serve pro- 
tective roles and also allow efflux of metabolic end 
products. Examination of our web sites allows the 
reader to dissect the transport capabilities of the 
other bacteria analyzed. The permease complement 
of an organism presumably reflects its metabolic 
capabilities as well as the environmental conditions 
under which the organism evolved.

Protein topology and functional assignment 

Our topological analyses suggested that in each 
organism studied, proteins of one to three transmembrane
α-helical spanners (TMSs) possess fewer
identifiable homologs than proteins of 0 or ≥4
TMSs (unpublished results). This observation presumably
relates to the functions of these proteins.
We suggest that a large percentage of these pro- 
teins primarily serve structural functions, and that 
specific amino acyl residues within these proteins 
are, in general, far less critical to function than are 
those of soluble enzymes (0 TMS) and catalytic 
transport proteins (usually >4 TMSs). Lack of con- 
straint for sequence divergence can explain our 
unpublished observations. 

Conclusions and perspectives 

The results described here summarize analyses 
that substantially extend our understanding of the
transport capabilities of microorganisms and the 
evolution of transport proteins. Further molecular 
genetic, biochemical and physiological analyses 
will be required to identify the functions of the per- 
meases and their families that are as yet uncharac- 
terized. Further genome sequencing projects 
should additionally reveal the distributions of the 
various types of transport systems in the living 
world. Such studies together with three-dimen- 
sional analyses will ultimately provide an under- 
standing of the evolutionary origins and 
mechanistic details of transport systems of all 
types. 

Materials and Methods 

The complete protein inventories from each organism 
were obtained from the following sites: H. influenzae,
M. genitalium, H. pylori and M. jannaschii
(http://www.tigr.org/tdb/mdb/mdb.html; Fleischmann et al., 1995;
Fraser et al., 1995; Bult et al., 1996; Tomb et al., 1997),
E. coli (http://www.genetics.wisc.edu:80/index.html;
Blattner et al., 1997), and Synechocystis PCC6803
(http://www.kazusa.or.jp/cyano/cyano.html; Kaneko et al.,
1996). The B. subtilis sequences analyzed were obtained
courtesy of A. Danchin prior to release of the complete
NRSub database (Biaudet et al., 1996; Kunst et al., 1997;
http://www.pasteur.fr/bio/SubtiList.html). The complete
genomic sequence of M. pneumoniae (Himmelreich
et al., 1996) revealed a complement of transporters similar
to that of M. genitalium, and analysis is consequently
not presented here. Analyses of the S. cerevisiae genome 
(Horak, 1997) will be reported elsewhere (unpublished). 

Automated hydropathy analyses on the complete 
complement of proteins encoded within each genome 
were performed using a modified version of the program 
MEMSAT (Jones et al., 1994, modified by us). Due to
limitations of the program, proteins which were less 
than 50 amino acids in length were excluded from the 
analyses, and proteins in excess of 1000 amino acids 
were truncated. Additional hydropathy analyses were 
performed using the programs TOP-PRED (Claros & von 
Heijne, 1994) and TMpred (Hofmann & Stoffel, 1993). 
Based on these analyses, the number of TMSs in each 
protein analyzed was predicted. 
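
As a rough illustration of this preprocessing and TMS-counting step, the following Python sketch applies the two length rules stated above and then counts hydrophobic stretches. It does not reproduce MEMSAT, TOP-PRED or TMpred. A plain sliding-window average over the Kyte-Doolittle hydropathy scale is swapped in, and the window length (19) and threshold (1.6) are illustrative assumptions rather than settings taken from this study.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

MIN_LEN = 50    # shorter proteins are excluded, as stated in the text
MAX_LEN = 1000  # longer proteins are truncated, as stated in the text


def prepare(seq):
    """Apply the length rules; return None for excluded proteins."""
    seq = seq.upper()
    if len(seq) < MIN_LEN:
        return None
    return seq[:MAX_LEN]


def count_tms(seq, window=19, threshold=1.6):
    """Count non-overlapping windows with mean hydropathy at or above threshold.

    window and threshold are illustrative assumptions, not MEMSAT settings.
    Residues missing from the scale contribute 0.0.
    """
    n_tms, i = 0, 0
    while i + window <= len(seq):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            n_tms += 1
            i += window  # skip past this putative spanner
        else:
            i += 1
    return n_tms
```

A protein would first be passed through `prepare` and, if retained, through `count_tms`. Proteins with a nonzero count are the candidates carried forward to the similarity searches.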

All of the proteins, which contain >1 predicted TMS 
encoded within each genome, were screened against the 
databases for sequence similarities with known and 
putative transport proteins. Database searches were 
initially performed using BLAST (Altschul et al., 1990).
When no significant sequence similarity was detected,
additional searches were performed using FASTA
(Lipman & Pearson, 1985) and BLITZ (Sturrock &
Collins, 1993). The significance of sequence similarities
detected was confirmed using the programs RDF2
(Pearson & Lipman, 1988) and GAP (Devereux et al.,
1984) with 200 random shuffles. A comparison score of
nine standard deviations obtained with either one of 
these programs was considered sufficient to establish 
homology (Doolittle, 1986; Saier, 1994). Based on these 
analyses, the potential membrane transporters identified 
were classified according to sequence similarities into 
separate families of homologous proteins. 
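
The shuffle-based significance test described above can be sketched as follows. This is a minimal Python illustration and not the RDF2 or GAP implementation. The alignment scorer is a simple global alignment with identity scoring and a linear gap penalty, chosen here as an assumption, whereas the actual programs use substitution matrices and their own gap models. Only the number of shuffles (200) and the nine-standard-deviation cutoff come from the text.

```python
import random
import statistics


def align_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_z_score(a, b, n_shuffles=200, seed=0):
    """Z-score of the real score against scores for shuffled copies of b."""
    rng = random.Random(seed)
    real = align_score(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(align_score(a, "".join(residues)))
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - statistics.mean(background)) / sd


def is_homologous(a, b, cutoff=9.0):
    """Apply the nine-standard-deviation criterion described in the text."""
    return shuffle_z_score(a, b) >= cutoff
```

Shuffling preserves the composition and length of the second sequence, so the z-score measures how far the real alignment score lies above what residue content alone would produce.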

Identification of transporters from each organism was 
cross checked against the protein assignments from the 
genome sequencing projects and from earlier relevant 
studies (e.g. Riley, 1993; Riley & Labedan, 1996).
Additionally, representative members of each prokaryo- 
tic family of transporters and of various eukaryotic- 
specific transporter families were used to rescreen the 
genomes of each organism translated in all six reading 
frames. Auxiliary proteins and hydrophilic components 
of particular transport systems were subsequently ident- 
ified by database searches. Functional predictions for 
each putative transport protein were made whenever 
possible based on sequence and phylogenetic analyses of 
the families of transporters. Phylogenetic trees were 
constructed using the TREE (Feng & Doolittle, 1990) and 
PILEUP (Devereux et al., 1984) programs. For each
organism, the family classifications and functional 
identifications/predictions for the encoded transport 
proteins are available at our WWW sites
(http://www-biology.ucsd.edu/~ipaulsen/transport/titlepage.htm);
(http://www-biology.ucsd.edu/~msaier/transport/titlepage.html).
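
The rescreening of each genome "translated in all six reading frames" can be illustrated with a short Python sketch. The function names are hypothetical, and the standard genetic code is assumed throughout. That is a simplification, because mycoplasmas such as M. genitalium read UGA as tryptophan rather than as a stop codon, so a real analysis would need an organism-appropriate table.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one frame; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Each of the six translated strings could then be searched with representative transporter sequences, which is one way to find coding regions missed by the original gene calls.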

It should be noted that the analyses reported here do 
not include: (1) outer membrane transport proteins 
(Jeanteur et al., 1991); (2) proteins such as those of the
E. coli TonB/ExbB/ExbD complex which transduce 
energy from the inner to the outer membrane to drive 
outer membrane transport processes (Braun et al., 1994); 
(3) proteins involved in most protein secretory pathways, 
e.g. the E. coli Sec proteins (Saier et al., 1989); (4) proton 
and sodium ion-translocating electron transfer proteins 
(Gennis & Stewart, 1996); (5) Na+-transporting carboxylic
acid decarboxylases (Dimroth, 1997); (6) ion-translocating
flagellar motor (Mot) proteins (Nguyen &
Saier, 1996); and (7) proteins involved in competence for
DNA uptake (Dubnau, 1991; Macfadyen et al., 1996).
Many of these proteins are, however, included in the glo- 
bal classification of transport proteins included in the 
second of our two web sites cited above and to be published
elsewhere (Saier, 1998). Auxiliary transport proteins,
such as proteins from the MFP family (Dinh et al.,
1994) or the membrane-periplasmic auxiliary proteins of
the MPA1 and MPA2 families (Paulsen et al., 1997) were
not considered as separate transport systems, but were 
treated as components of the transporters with which 
they function. 